ABSTRACT

The identification and grading of adverse events (AEs) 
during the conduct of clinical trials is a labor-intensive 
and error-prone process. This paper describes and 
evaluates a software tool developed by City of Hope to 
automate complex algorithms to assess laboratory 
results and identify and grade AEs. We compared AEs 
identified by the automated system with those previously 
assessed manually, to evaluate missed/misgraded AEs. 
We also conducted a prospective paired time 
assessment of automated versus manual AE 
assessment. We found a substantial improvement in 
accuracy/completeness with the automated grading tool, 
which identified an additional 17% of severe grade 3-4
AEs that had been missed/misgraded manually. The 
automated system also provided an average time saving 
of 5.5 min per treatment course. With 400 ongoing 
treatment trials at City of Hope and an average of 1800 
laboratory results requiring assessment per study, the 
implications of these findings for patient safety are 
enormous. 

INTRODUCTION

Patient safety is of major concern during the 
conduct of clinical trials, where experimental and 
potentially toxic therapies are evaluated in 
humans. 1 Complete adverse event (AE) reporting 
during trial conduct imposes a large burden and 
presents a major challenge, requiring multiple 
assessments over time, for every treatment course 
for each participant. 2-4 Chart review to assess the 
presence and severity of AEs is expensive, inefficient, 
and imperfect. 5 6 Problems include under-reporting 
of low grade/recurrent AEs, and inconsistent or 
incomplete characterization and reporting of 
high grade AEs. 7 Without accurate AE reporting, 
treatments may appear less toxic than they are, 
potentially endangering patients. 8 

Approximately 30% of more than 100 000 clinical 
trials registered on the http://ClinicalTrials.gov/ 
website involve cancer. To assess AEs in oncology, 
the National Cancer Institute (NCI) developed the 
Common Terminology Criteria for Adverse Events 
(CTCAE), 8 9 a graduated scale for evaluating the 
severity of ~350 qualitative and quantitative AEs, 
from grade '1' (least severe) to '4' (most severe),
with grade '5' signifying AE-related death. Approx- 
imately 13% of the CTCAE is based on laboratory 
results, accounting for a significant number of 
reportable AEs (see figure 1 for examples).

This critical need to accurately and efficiently 
assess large quantities of laboratory-based AEs 
provides a prime opportunity to apply automated
decision support to reduce errors in transcription,
calculation, and interpretation. However, to date 
development of such applications is lagging due to 
barriers such as organizational issues, inadequate 
design, poor system performance, non-standard 
terminology/clinical documentation, and lack of 
demonstrable system value. 10-13 As Bates et al 
state, 'information technology has been viewed as 
a commodity, like plumbing, rather than as a stra- 
tegic resource that is vitally important to the 
delivery of care.' 14 Herein we report on a strategic 
decision support tool developed at City of Hope 
(COH) to improve subject safety, and our evalua- 
tion of this tool's utility and value. 

As an NCI-funded Comprehensive Cancer Center,
COH conducts ~400 clinical trials each year, 
enrolling over 1500 patients annually. Recognizing 
the enormous safety challenges created by this 
volume, in 2005 the COH Department of Infor- 
mation Sciences developed a software tool to 
automate detection of laboratory-based AEs. This 
decision support tool instantaneously assesses 
hundreds of electronic laboratory results to detect 
any abnormal findings, and grades AE severity 
according to CTCAE algorithms. While detecting 
abnormal laboratory results has been an infor- 
matics staple for many years, 15-17 applying deci- 
sion support to invoke the complex CTCAE 
algorithms to automatically grade AEs represents 
a novel application. 

COH Clinical Research Associates (CRAs) have 
assessed over 1 million laboratory results using our 
automated grading tool to date. Recognizing the 
potential value to other institutions, COH devel- 
oped an open source version, the Cancer Auto- 
mated Lab-based Adverse Event Grading Service 
(CALAEGS). While experientially we believed this 
tool greatly enhanced the validity and efficiency of 
laboratory-based AE grading, a formal evaluation 
was required to confirm this impression. This paper 
describes our evaluation of CALAEGS, to our 
knowledge the first open source tool to assist with 
the complex task of grading laboratory data to 
ensure patient safety. 

METHODS 

CALAEGS accepts electronic laboratory data, and
provides grading results through a web-based user 
interface, web services, and/or a Java API (applica- 
tion programming interface). The user interface 
allows institutions to customize the system to 
their specific data source formats and coding. The 
system is installed behind an institution's firewall 
to avoid confidentiality issues. Laboratory data can
be submitted as comma-separated values, Extensible Markup
Language (XML), or Health Level Seven (HL7) version 3
messages. Grading results are returned in a machine-readable
format compatible with the original input format, and as
a human-readable flowsheet rendered in Portable Document
Format (PDF) (see figure 2).

BLOOD/BONE MARROW

Adverse event: Hemoglobin (short name: Hemoglobin)
  Grade 1: <LLN - 10.0 g/dL; <LLN - 6.2 mmol/L; <LLN - 100 g/L
  Grade 2: <10.0 - 8.0 g/dL; <6.2 - 4.9 mmol/L; <100 - 80 g/L
  Grade 3: <8.0 - 6.5 g/dL; <4.9 - 4.0 mmol/L; <80 - 65 g/L
  Grade 4: <6.5 g/dL; <4.0 mmol/L; <65 g/L
  Grade 5: Death

Adverse event: Leukocytes (total WBC) (short name: Leukocytes)
  Grade 1: <LLN - 3000/mm^3; <LLN - 3.0 x 10^9/L
  Grade 2: <3000 - 2000/mm^3; <3.0 - 2.0 x 10^9/L
  Grade 3: <2000 - 1000/mm^3; <2.0 - 1.0 x 10^9/L
  Grade 4: <1000/mm^3; <1.0 x 10^9/L
  Grade 5: Death

METABOLIC/LABORATORY

Adverse event: Lipase (short name: Lipase)
  Grade 1: >ULN - 1.5 x ULN
  Grade 2: >1.5 - 2.0 x ULN
  Grade 3: >2.0 - 5.0 x ULN
  Grade 4: >5.0 x ULN
  Grade 5: (none)

Adverse event: Magnesium, serum-high (hypermagnesemia) (short name: Hypermagnesemia)
  Grade 1: >ULN - 3.0 mg/dL; >ULN - 1.23 mmol/L
  Grade 2: (none)
  Grade 3: >3.0 - 8.0 mg/dL; >1.23 - 3.30 mmol/L
  Grade 4: >8.0 mg/dL; >3.30 mmol/L
  Grade 5: Death

Adverse event: Magnesium, serum-low (hypomagnesemia) (short name: Hypomagnesemia)
  Grade 1: <LLN - 1.2 mg/dL; <LLN - 0.5 mmol/L
  Grade 2: <1.2 - 0.9 mg/dL; <0.5 - 0.4 mmol/L
  Grade 3: <0.9 - 0.7 mg/dL; <0.4 - 0.3 mmol/L
  Grade 4: <0.7 mg/dL; <0.3 mmol/L
  Grade 5: Death

Figure 1 Example of laboratory-based adverse event (AE) grading algorithms for two CTCAE V.3.0 organ systems: blood/bone marrow and
metabolic/laboratory. LLN, lower limit of normal; ULN, upper limit of normal; WBC, white blood cell.
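
As a rough illustration of how the threshold-based algorithms in figure 1 translate into code, the following Python sketch grades two kinds of result: a 'low' AE defined by absolute cut-offs (hemoglobin in g/dL) and a 'high' AE defined by multiples of the ULN (lipase). The function names and data layout are assumptions made for illustration; this is not the CALAEGS implementation or its Java API. Grade 5 (death) cannot be derived from a laboratory value, and grades that depend on clinical judgment still require human review.

```python
from typing import Sequence

# Descending cut-offs separating grades 1/2, 2/3 and 3/4 (g/dL, from figure 1).
HEMOGLOBIN_G_DL = (10.0, 8.0, 6.5)
# Ascending cut-offs as multiples of the upper limit of normal (from figure 1).
LIPASE_X_ULN = (1.5, 2.0, 5.0)


def grade_low(value: float, lln: float, cutoffs: Sequence[float]) -> int:
    """Grade a 'low' AE: 0 if value is at or above the LLN, else 1-4."""
    if value >= lln:
        return 0
    for grade, cutoff in enumerate(cutoffs, start=1):
        # '<10.0 - 8.0' is read as: below 10.0 and not below 8.0.
        if value >= cutoff:
            return grade
    return 4


def grade_high_multiple(value: float, uln: float, multiples: Sequence[float]) -> int:
    """Grade a 'high' AE defined by multiples of the ULN: 0 if within normal."""
    if value <= uln:
        return 0
    for grade, multiple in enumerate(multiples, start=1):
        # '>1.5 - 2.0 x ULN' is read as: above 1.5 x ULN and not above 2.0 x ULN.
        if value <= multiple * uln:
            return grade
    return 4
```

For example, with a hypothetical LLN of 12.0 g/dL, `grade_low(7.2, 12.0, HEMOGLOBIN_G_DL)` returns 3, because 7.2 g/dL falls in the '<8.0 - 6.5 g/dL' band.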

CALAEGS incorporates national standards such as the 
Biomedical Research Information Domain Group (BRIDG) 
model 18 and Unified Code for Units of Measure (UCUM), 19 and 
is certified as bronze-level compatible with NCI's Cancer
Biomedical Informatics Grid (caBIG). It runs on Java 1.5+ in
a J2EE web container (Tomcat 5.0+ and JBoss 4.0.5+) and
requires a MySQL 5.0+ database.

CALAEGS assesses 39 laboratory-based AE terms based on 
NCI CTCAE version 3.0 9 (refer to table 2). The grading algo- 
rithms received thorough testing across several phases, including 
unit, integration, system, and regression testing. The test 
approach included a range of conditions, including grade 
boundaries, simple and complex assessments, and fail conditions.
CALAEGS assessments are considered preliminary only, as
some laboratory-based AE grades depend on human judgment as
well, such as knowledge of additional patient conditions (eg,
concomitant life-threatening consequences).

Figure 2 CALAEGS screenshots showing the entry screen for assessing a single laboratory result, for example, from an outside laboratory with no
electronic file available (left), and the flowsheet generated to grade multiple laboratory-based adverse events (AEs) imported from an electronic file
(right). CALAEGS, Cancer Automated Lab-based Adverse Event Grading Service.

Table 1 Protocols for comparing manual versus automated laboratory-
based adverse event (AE) grading

Study number   Study phase   Number of COH patients graded   Number of laboratory results evaluated

Hematologic protocols
1              Pilot          8                                9775
2              Pilot          5                                5117
3              I/II           1                                 614
Subtotal                     14                              15 506

Solid tumor protocols
4              I             13                                1382
5              I              3                                 379
6              I              1                                 313
7              I              2                                 150
8              I              2                                  65
9              II             4                                 542
10             II             1                                 266
Subtotal                     26                                3097

Total                        40                              18 603

COH, City of Hope.

In a paired retrospective study design, we compared the
accuracy and completeness of AE data graded manually, prior to
the availability of the automated tool, with results reassessed via
CALAEGS. We evaluated 10 sequential in-house therapeutic trials
of varying size, diagnosis, and phase, from the time frame just
prior to implementing our automated grading service, to minimize
confounding factors (eg, CRA expertise). These 10 trials
encompassed 40 patients and 18 603 laboratory results (table 1).

The 18 603 laboratory results were read into CALAEGS, and
the automated results were compared with the manually graded results
recorded in our clinical trials system. Discrepancies were categorized
as missed AEs (true AEs that were not identified) or
misgraded AEs (AEs with an incorrect numeric grade or direction,
ie, hypo- vs hyper-). All discordant results were reviewed
by our QA experts to verify that each suspected discrepancy was
a true error, eliminating any protocol-specific exceptions (eg, if
the study only requires recording the highest grade per course).
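
The discrepancy categories described above can be sketched as a small comparison routine. This is an illustrative Python sketch only: the `Finding` layout, the function names, and the label strings are hypothetical, and the output merely flags candidate discrepancies, which in the study were then confirmed by expert review. Because the direction is carried in the AE term (eg, 'Hypermagnesemia' vs 'Hypomagnesemia'), a wrong-direction entry is classified as misgraded.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# (AE term, grade) for a laboratory result, or None when no AE applies (grade 0).
Finding = Optional[Tuple[str, int]]


def classify(manual: Finding, automated: Finding) -> str:
    """Compare the manually recorded finding with the automated one."""
    if automated is None:
        # No true AE: any manually recorded AE is an over-call.
        return "concordant" if manual is None else "overgraded"
    if manual is None:
        return "missed"
    if manual == automated:
        return "concordant"
    # Same result, different grade or different direction (hypo- vs hyper-).
    return "misgraded"


def tally(pairs: Iterable[Tuple[Finding, Finding]]) -> Counter:
    """Count each category over (manual, automated) pairs."""
    return Counter(classify(manual, automated) for manual, automated in pairs)
```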

To quantify AE grading efficiency, we conducted a prospective
paired evaluation comparing time required for manual versus
automated AE grading. In timed sessions, four CRAs graded five
patients each from their current protocol portfolio, first manually
and then 2-4 weeks later utilizing the CALAEGS tool, yielding
20 paired assessments. The assessment sequence was fixed
(manual followed by automated) because, had CALAEGS been run first,
familiarity with the resulting AEs might have increased CRA
efficiency when re-grading AEs manually.

A protocol specifying the design and regulatory processes for 
this evaluation was approved by the COH Institutional Review 
Board. The protocol stipulated that the Principal Investigator 
and biostatistician for studies evaluated were to be notified of 
any grading discrepancies identified; if any serious consequences 
were identified, the IRB and appropriate regulatory agencies 
would be notified as well. Analyses were conducted using SAS 
software version 9.1 (SAS Institute). 

RESULTS 

From the 18 603 laboratory results, 643 true AEs were detected.
No valid AEs identified manually were missed by the automated
system, and review of all 643 AEs by our QA experts verified
that the CALAEGS grades were accurate. Therefore, discrepancies
between the automated and manual approaches were
attributable to errors made during manual grading, which was
inaccurate 15% of the time (96/643, table 2). Seventy laboratory-based
AEs (11%) were missed by manual grading, and 26
manually graded AEs (4%) were misgraded (25 understated the
condition, one was in the wrong direction).

Table 2 Comparison of laboratory-based adverse events (AEs)
detected by manual versus automated grading method by AE term for
643 true AEs*

AE term                    True AEs   AEs correctly detected manually   AEs missed by manual method   AEs misgraded manually

Hematologic laboratory results
Hemoglobin                   47         43                                 2                             2
Leukocytes                   47         40                                 5                             2
Neutrophils                  29         22                                 6                             1
Platelets                    48         46                                 1                             1
PTT*                          6          5                                 1                             0
Subtotal                    177        156                                15                             6

Chemistry laboratory results
Acidosis/alkalosis            5          5                                 0                             0
Alkaline phosphatase         28         26                                 1                             1
ALT*                         38         34                                 1                             3
Amylase                       1          1                                 0                             0
AST*                         58         52                                 3                             3
Bicarbonate, serum-low       14         10                                 4                             0
Bilirubin                     7          6                                 1                             0
Cholesterol                  13         10                                 2                             1
Creatine phosphokinase        2          2                                 0                             0
Creatinine                   21         19                                 2                             0
GGT*                          5          4                                 1                             0
Hypoalbuminemia              39         35                                 4                             0
Hyper/hypocalcemia           44         38                                 5                             1
Hyper/hypoglycemia           40         30                                 7                             3
Hyper/hypokalemia            29         24                                 3                             2
Hyper/hypomagnesemia         34         26                                 6                             2
Hyper/hyponatremia           26         24                                 2                             0
Hypertriglyceridemia         17         14                                 3                             0
Hyperuricemia                 7          3                                 3                             1
Hypophosphatemia             32         24                                 5                             3
Lipase                        2          2                                 0                             0
Proteinuria                   4          2                                 2                             0
Subtotal                    466        391                                55                            20

Total                       643        547                                70                            26
Percent                                 85                                11                             4

*This table shows the true AEs that were missed, misgraded, or correct; 5 labs that were
incorrectly graded manually as an AE, but the true Grade was 0, are not included here.
ALT, serum glutamic-pyruvic transaminase; AST, serum glutamic-oxaloacetic transaminase;
GGT, gamma glutamyl transferase; PTT, activated partial thromboplastin time.

Of the missed AEs, 86% (60/70) were relatively minor (grade
1-2). However, 22 severe AEs (grade 3-4) escaped detection by
the manual method, through lack of identification (n=10) or
incorrect grading to a lower level (n=12). Out of 130 severe
grade 3-4 AEs identified via CALAEGS, 17% (22/130) were missed or
misgraded manually. Overall, 40% of patients evaluated (16/40)
experienced one or more missed/misgraded severe AEs.
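
The percentages reported in this section follow directly from the counts given in the text and in table 2. The short Python check below reproduces them; the variable names are ours and the rounding to whole percentages is an assumption about how the figures were reported.

```python
TRUE_AES = 643            # true AEs among the 18 603 laboratory results
MISSED = 70               # true AEs not identified manually
MISGRADED = 26            # true AEs recorded with a wrong grade or direction
SEVERE_AES = 130          # grade 3-4 AEs identified via CALAEGS
SEVERE_MISSED = 10        # severe AEs not identified manually
SEVERE_UNDERGRADED = 12   # severe AEs recorded at a lower grade
PATIENTS = 40
PATIENTS_AFFECTED = 16    # patients with one or more missed/misgraded severe AE


def percent(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


summary = {
    "manual error rate": percent(MISSED + MISGRADED, TRUE_AES),
    "missed": percent(MISSED, TRUE_AES),
    "misgraded": percent(MISGRADED, TRUE_AES),
    "minor share of missed AEs": percent(MISSED - SEVERE_MISSED, MISSED),
    "severe AEs missed/misgraded": percent(SEVERE_MISSED + SEVERE_UNDERGRADED, SEVERE_AES),
    "patients affected": percent(PATIENTS_AFFECTED, PATIENTS),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```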

Figure 3 shows the direction and magnitude of grading error 
for 101 missed/misgraded AEs. The majority involved under- 
reporting; however, in five instances the manually recorded AE 
grade was higher than the true result (recorded as grade 1, true 
grade 0). One misgraded AE (see '*' in figure 3) was recorded at
the appropriate grade, but the direction was incorrect
('hyper' when it was actually 'hypo').

Figure 3 Missed/misgraded adverse events (AEs) by the manual
assessment method (manual grade, 0-4), against the true grade as detected by CALAEGS;
dashed boxes highlight the severe (grade 3, 4) missed/misgraded AEs.
*Misgraded because of wrong direction: term incorrectly identified as
'hyper' instead of 'hypo'. CALAEGS, Cancer Automated Lab-based
Adverse Event Grading Service.

The prospective timed grading evaluation showed that using 
CALAEGS led to time savings in 18/20 paired assessments (90%); 
the average time saved was 5 min 25 s (5:25) per treatment 
course (95% CI 2:24 to 8:26). For two assessments the decision 
support tool required slightly more time (10 s and 2 min). 
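
For readers who wish to reproduce this kind of interval, the Python sketch below shows one common way to compute a mean paired difference with a 95% CI, and then works backwards from the reported figures. It assumes a Student's t interval on the 20 paired differences; the paper does not state which method was used, so the implied standard deviation is indicative only. The function names are hypothetical, and the critical value 2.093 is the standard two-sided 95% value for 19 degrees of freedom.

```python
import math
import statistics
from typing import Sequence, Tuple

T_CRIT_DF19 = 2.093  # two-sided 95% critical value, Student's t, 19 df


def paired_ci(differences: Sequence[float], t_crit: float) -> Tuple[float, float, float]:
    """Mean of paired differences with a t-based confidence interval."""
    mean = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return mean, mean - t_crit * standard_error, mean + t_crit * standard_error


def to_seconds(mmss: str) -> int:
    """Convert 'm:ss' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Reported values: mean saving 5:25, 95% CI 2:24 to 8:26, 20 paired assessments.
mean_s = to_seconds("5:25")
lower_s = to_seconds("2:24")
upper_s = to_seconds("8:26")

half_width = (upper_s - lower_s) / 2            # the interval is symmetric about the mean
implied_se = half_width / T_CRIT_DF19           # assumes a t-based interval
implied_sd = implied_se * math.sqrt(20)         # SD of the paired differences, in seconds

print(f"half-width {half_width:.0f} s, implied SD of differences {implied_sd:.0f} s")
```

Under that assumption, the reported interval corresponds to a between-assessment standard deviation of several minutes, which is consistent with the two assessments in which the tool took longer.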

DISCUSSION 

Health information exchange systems can substantially impact 
medical quality and safety through automated decision making 
and knowledge acquisition tools. 21 Yet to date the nation's 
healthcare system has fallen far short in applying new tech- 
nology safely and appropriately to enhance the translation of 
new biomedical discoveries into practice. 11 Strategies for AE 
detection that incorporate electronically screened data can cost 
significantly less per AE detected, an attractive improvement 
over pure manual review. 22 

The high prevalence of AEs has made patient safety a major 
concern when treating patients with experimental clinical trial 
agents. 11 Identification of AEs is a major challenge, and effective 
methods for detecting such events are required. 6 23 Because 
laboratory data are computerized, AEs detected through elec- 
tronic surveillance of laboratory results and their normal ranges 
are particularly suited for automated decision support. 24

A very high overall accuracy level was seen for manual grading in our
evaluation (18 502 of 18 603 assessments correct, 99.5%). Yet the fact remains that
17% of all severe grade 3-4 AEs were missed or undergraded by traditional
chart review, affecting 40% of patients evaluated. Fortunately,
a thorough review of the medical records of these 16 patients 
showed that no harm occurred, as in each case concurrent 
medical problems led to appropriate care. However, the potential 
for patient harm certainly exists if severe AEs go undetected. 

Missed/misgraded AEs are concerning not only for patient 
safety, but for overall scientific validity. In phase I studies, dose 
escalation is driven by AEs, such that discrepancies can impact 
study conduct. Comprehensive AE reporting is needed to 
correctly interpret trial results, and avoid under-representing 
toxicity burden. Even low grade AE detection is crucial in
reporting clinical trials, 1 6 for example, to uncover pharmacogenetic
syndromes. While 78% of errors in our evaluation involved
grade 1-2 AEs, even these can reveal critical toxicity patterns prior to
introducing experimental agents into standard care.

Although the time saving was less dramatic than we
expected (~5.5 min per treatment course), even this small
improvement translates into a potentially large benefit, given 
the volume of laboratory results per protocol (averaging 1800 per 
study in our evaluation). With an average of three courses of 
treatment for 1500 patients accrued annually at COH, even 
modest efficiency improvements have major impact. 
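
To make the scale concrete, the figures quoted above can be combined in a back-of-envelope calculation. The Python sketch below assumes that every treatment course is graded and that the mean saving of about 5.5 min applies uniformly; both are simplifying assumptions, and the resulting total is an extrapolation rather than a measured result.

```python
PATIENTS_PER_YEAR = 1500        # patients accrued annually at COH
COURSES_PER_PATIENT = 3         # average number of treatment courses
MINUTES_SAVED_PER_COURSE = 5.5  # approximate mean saving per course


def annual_hours_saved(patients: int, courses: int, minutes_per_course: float) -> float:
    """Total grading time saved per year, in hours."""
    return patients * courses * minutes_per_course / 60


hours = annual_hours_saved(PATIENTS_PER_YEAR, COURSES_PER_PATIENT, MINUTES_SAVED_PER_COURSE)
print(f"about {hours:.1f} hours of grading time per year")
```

Under these assumptions the saving amounts to roughly 410 hours of CRA time per year.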

Limitations and future plans 

Due to the large number of laboratory results evaluated, it was 
not possible to directly assess every result for true AEs that might 
have been missed by both the manual and automated methods. 
However, we can reasonably infer that such false negatives are 
highly unlikely based on the testing and validation of the system. 

Achieving the optimal specificity of detection systems often 
still requires some manual review, prompted by the automated 
decision support. 6 CALAEGS prompts such a review when 
additional criteria are required to determine grade (eg, concurrent 
hospitalization or physiological consequences). Therefore 
CALAEGS is an aid to, not a replacement of, human judgment. 

As with any decision support system, there is a potential 
danger when changes to the input data or algorithms occur, 
intentionally or unintentionally. Our domain experts are 
continually vigilant for any changes in laboratory reporting 
standards, and rigorous retesting/validation is performed if the 
algorithms are updated. Recently NCI released CTCAE V.4.0,
with many more laboratory-based AEs involving qualitative
criteria. With the advent of CTCAE V.4.0, integrating additional
data sources regarding patient status becomes more important, and
is planned for our next system enhancements. The caBIG
program is developing tools to manage AE collection and 
regulatory/institutional reporting requirements (eg, caAERS); 
integration of CALAEGS with such tools may facilitate accu- 
rate real-time identification of serious AEs that require imme- 
diate reporting. 

Information technology can not only help detect AEs, but also 
facilitate more rapid response once an AE occurs. 11 Currently, 
the COH grading system is used as a data collection tool 
following treatment course completion. We are in the process of 
deploying the system to conduct nightly surveillance of the past 
day's laboratory results, to provide caregivers with refined 
signals indicating worsening patient conditions. Deployment 
will require an appropriate workflow in clinic, and avoidance of 
'alert fatigue' among caregivers. 25 26 Adding a configurable rules 
engine interface to incorporate protocol-specific rules to 'fine 
tune' the algorithms will provide additional efficiency in future. 

Conclusions 

Our evaluation demonstrated that CALAEGS improves accuracy, 
completeness, and efficiency in detecting and grading laboratory- 
based AEs, facilitating documentation of the full toxicity profile 
of experimental agents. With the large number of clinical trials 
performed at centers nationwide, the potential beneficial impact 
on patient safety, efficient resource usage, and unbiased trial 
reporting is tremendous. 